Deals of a feather… Modelling latent classes in R&D collaboration data using finite mixture analysis

This work explores if behaviour-based asymmetries are likely to impact deal valuation in the life sciences by examining positive public sentiment as a proxy for market behaviour when negotiating under asymmetric conditions to examine heterogeneity in research & development collaboration (RDC) deal data. We use public sentiment as a proxy for behaviour along with stage of development-based RDC deal data to search for latent classes in the deal data using finite mixture modelling. The analysis reveals a nuanced picture: public sentiment emerges as a significant predictor of deal value, but only for approximately 15% of the data set. This subset exclusively includes firms in the Preclinical stage, where projects have moved past discovery but are yet to commence human studies. Interestingly, the research finds that sentiment’s impact on deal valuation is particularly pronounced in this stage, suggesting heightened market sensitivity. With recent research demonstrating that knowledge asymmetry and behaviour impact valuation volatility, we take this further by capturing latent classes within the data which demonstrates how behaviour is most influential in deal pricing considerations. We argue that our research demonstrates the impact of asymmetry and market behaviour on a subset of RDCs where products are known, but likelihood of success is difficult to determine.


Introduction
Policymakers and pharmaceutical partnering executives the world over are approaching the allocation of capital towards biotechnology research and development (R&D) with renewed vigour.The financial impacts of the COVID-19 pandemic continue to demonstrate the importance of pandemic preparedness in an ever-increasingly globalised economy [1].Capital allocation in the context of biotechnology product development usually happens through the mechanism of an R&D collaboration (RDC) [2].This mechanism allows for a stage-wise approach to product development, where a product is generally provided with enough funding to reach the next clinical milestone [3].At this point, the product is tested in a clinical setting, where a go/no-go decision on further development, and ultimately, further investment, is made.The gap in research can be found where valuation in this paradigm is challenging due to the inherent knowledge asymmetries found in these types of transactions, long drug development timelines (Drug development timelines vary drastically by therapeutic area, the allocation of capital and the urgency to bring the particular product to market.Whilst typical products can take *15 years to make it through the regulatory and development frameworks, the COVID-19 pandemic has demonstrated how we can speed development where the need is great. ..) and the inherently complex nature of scientific product development where human safety is of paramount concern [4][5][6].

Knowledge asymmetries and decision making
Knowledge asymmetries in this context are underpinned by the fact that trust, and ultimately understanding, between collaborating firms is not symmetrical.Where firms choose, or fail, to disclose for any myriad of reasons, commercial or otherwise, knowledge asymmetries will exist [7,8].Bias in these settings tends toward the survivor, with disclosures generally only coming from successful transactions [9].Knowledge asymmetries are also a natural part of the drug development process, where early-stage products lack the clinical and safety data required to inform the assumptions around safety and efficacy.In this context, disclosures increase as a product progresses through development, and thus, practitioners are better informed to make decisions around valuation [10].
Understanding the theory of bargaining under conditions of asymmetric information when considering RDCs offers a nuanced understanding of the strategic interplay between entities engaged in such partnerships.It illuminates the intricate dynamics that ensue when information is unevenly distributed among stakeholders, affecting decision-making processes and valuations in significant ways.A comprehensive exploration of the completeness of information available to parties involved in RDCs, including firms and investors, would reveal how disparities in knowledge can influence behaviour and valuation judgments.Asymmetric information shapes the bargaining landscape, this framework enriches our comprehension of the valuation mechanisms at play within RDCs.It underscores the critical role that information plays in the negotiation and decision-making processes, providing valuable insights into the complexities of valuing collaborative ventures in sectors characterised by high uncertainty and innovation [4,8,11].
Based upon the assumption that complex decisions such as those found in RDCs are heterogeneous in nature, we assume decision makers act with autonomy, seeking to guide decisionmaking behaviour through the examination of market sentiment as discussed in research by Lecouteux et al., 2015 andMardjo et al., 2022 [12, 13].With these assumptions in hand we examine if behaviour-based asymmetries and bias impact RDC valuation through the application of Finite Mixture Modelling (FMM) on RDC deal data.We use public sentiment analysis based upon micro-blogging (Twitter) as a proxy for market behaviour.We also examine if the knowledge asymmetries outlined are meaningful in RDC valuations.In section 2 we outline the models traditionally used to value RDC transactions.In section 3 we go on to describe the benefits of using a proxy for market behaviour when modelling RDC values.In sections 4 & 5 we propose our hypothesis and describe the data used for this research.Finally, we follow this with our methodology, results, and then we discuss our findings, their implications, limitations and how this research can be expanded through future inquiry.

R&D collaboration valuation
Valuation methodologies for biotechnology and pharmaceutical transactions abound and the assumptions of efficient market theory generally hold true where disclosures are full and transparent.In these cases, valuations can rely on information available to the market to inform the valuation process [14,15].Where disclosures are not forthcoming, we see that asymmetries and bias born from RDC deal survival exist.Methodologies have been developed to capture these biases and asymmetries and provide the tools needed to value biotechnology transactions where these limitations exist.Yet, whilst the ontologies are plentiful, the preponderance of valuation methodologies can be traced back to two core methods: binomial lattice (decision tree) or risk-adjusted net present value (rNPV).Where these methods fail to yield appropriate results, practitioners look to comparable deals to understand the market conditions for deal-doing [3,[16][17][18][19][20]].
An rNPV is described in Eq 1, where the methodology accounts for all the future revenues, risks, and costs associated with a biotechnology RDC.
P represents the project's payoff, R 0 captures the current risk or likelihood of success.R i represents the risk-adjusted at time i and C i R 0 R i captures the project's risk-adjusted costs associated with the project.An rNPV is equivalent to an RDCs time cost of money over the project's life cycle, which is otherwise analogue to an IRR, and project-specific risk at that point in the project (R 0 ).Compounding risks throughout the project culminate early in the project to effectively devalue the project per stage of development based on the ultimate value of the product at the conclusion of the RDC [19].
A binomial lattice based upon the Kellogg and Charnes approach is outlined in Eq 2. Whilst this methodology ultimately breaks down into a series of NPV calculations at a given go/no-go point, it allows the user to map the product life cycle by applying appropriate risk weightings at each stage of development for a detailed schematic of the decision matrix across development, where i = 1, . .., 5 is a representation of the clinical stages of development: discovery, preclinical, phase I, phase II and Phase III.[17].
In Eq 2, ρ i describes the conditional probability of stage of development i is the final stage for a product at i − 1. T is when all future cash flows = 0. DCF i,t is the cash flow at t given i is the end stage.r d the discount rate and j = 1, . .., 5 represents an index of quality (An index of 5 was chosen based on the five categories (dog, below average, average, above average and breakthrough) described by Kellogg & Charnes, 2000, which is more than sufficient for this description of a binomial lattice.).The probability of a drug being of a particular quality is q j .CCF j,t represents the expected cash flow upon commercialization (t|j) and r c is the discount rate for CCF j,t [17].
When considering either approach, what we observe is that both rely on the determination of some form of risk-adjusted NPV based on the point at which the valuation is sought.Whilst we appreciate this is an oversimplification of a diverse topic, the challenge with this approach is defining and quantifying risk.Most approaches seek to "sum up" risk.The obvious challenge with this approach is that doesn't allow for the idiosyncrasies, difficulties and market expectations on certain classes of products and treatment areas of varying impact (how do we value an early-stage acne product alongside an early-stage cancer product based upon a novel antibody using the same benefit/cost analysis paradigm without understanding the risks and the impact this can have on clinical success or survivorship?).In practice, these idiosyncrasies are accounted for by considering comparable deals (stage of development, product type, risk, etc.) to understand what the market is willing to bear for a given product in a given category [19].With the approach of this research we posit a more comprehensive approach to the intangible behavioural asymmetries that impact valuation through the addition of a proxy for market behaviour derived from public sentiment, relevant to a chosen indication or class of drug.This approach would bring a real-time understanding of the market's appetite for risk at the time the valuation is sought.

Understanding market behaviour
In recent years, a plethora of research has looked at sentiment analysis as a proxy for market behaviours [21][22][23].Many factors have driven this move towards sentiment analysis including the accessibility of rich sources of data such as Reddit, Facebook and Twitter to name just a few.The accessibility and volume of this data allow researchers the opportunity to gauge public agglomeration of opinions on very specific topics in near real-time settings.Whilst the discussions around the application of sentiment analysis are ongoing and evolving, its impact on the study of market behaviour is clear, where topics such as stock market price evaluation and Bitcoin volatility are areas examined [24,25].
Behaviour-based asymmetries refer to the disparities in how individuals or entities act or react based on varying levels of information, expectations, and sentiment, significantly influencing market behaviours and decision-making processes.These asymmetries, when applied to market behaviors, particularly in the context of RDCs underscore the importance of understanding nuanced investor sentiments and informational discrepancies to refine valuation methodologies and predict market movements with greater accuracy [8,26].
When considering biotechnology and pharmaceutical RDCs, a meaningful proxy for market behaviour could prove extremely important.Products born out of these transactions can have vast impacts on quality of life and longevity.These effects encourage engagement and buy-in from diverse stakeholder groups who may be affected by these transactions.Whilst previous work has examined the impact of knowledge asymmetries in the RDC context, little work has been done to date to examine to what extent market behaviour impacts RDC valuation methodologies.

Sentiment analysis
The use of either positive or negative sentiment is sensitive to context and serves different purposes.Negative sentiment can be used in well-developed markets where opinions on a service, product or business can be used to understand if market conditions are skewing negatively on a given topic.This can't be said for new ventures, products or services with little market exposure or visibility.In this context we look to positive sentiment to gain an understanding of broader market conditions, for example, if we don't have access to enough data for a particular cancer product, we take our examination a level higher to look at the broader industry that produces these types of products as a whole.This is supported where research looking at the airline KLM demonstrated that increased online engagement with customers through the use of social media leads to greater positive engagement in the intended audiences that engage with KLM on these platforms [27].This is our approach to this research.We look at positive sentiment in relation to the biotechnology industry from a temporal perspective to ascertain investment appetite [28].
In the given context, focusing predominantly on positive sentiment, rather than incorporating negative sentiment, is strategically justified due to several compelling reasons unique to the biotechnology sector and the nature of emerging markets.Firstly, in these nascent stages, negative sentiment, while informative, can disproportionately affect investor confidence and public perception, potentially stifling innovation and funding opportunities crucial for earlystage research and development.Furthermore, in emerging markets or for new products with limited visibility, the availability of reliable negative sentiment data can be scarce or overly skewed by isolated incidents, making it less reliable for comprehensive market analysis.The emphasis on positive sentiment allows researchers and investors to identify and foster areas of growth and innovation, serving as a barometer for market enthusiasm and potential areas for strategic investment [29,30].
Additionally, focusing on positive sentiment aligns with the objective of understanding broader industry dynamics and investment trends.This perspective is particularly valuable in assessing the industry's capacity to attract capital, drive growth, and navigate through the complexities of bringing groundbreaking solutions to market.Thus, while not dismissing the importance of negative sentiment in providing a balanced view, the strategic decision to focus on positive sentiment in this context is informed by the specific characteristics of the biotechnology sector, the nature of emerging markets, and the objective to gauge investment appetite and industry momentum.This approach enables a constructive framework for analysing and interpreting market sentiments, which is pivotal for fostering an environment conducive to innovation and investment in high-risk, high-reward sectors like biotechnology [28,31].
Integration of sentiment analysis in the pharmaceutical and biotechnology industries has revolutionised the understanding of public opinion and consumer behaviour.Recent studies have used advanced deep learning models and demographic analyses to gauge public sentiment toward COVID-19 vaccines, highlighting the importance of addressing vaccine hesitancy and understanding diverse public perceptions [32].Tools have also been developed that allow companies to respond quickly to consumer concerns and adapt tailored strategies for the pharmaceutical industry [31].Sentiment analysis has also shown its potential to detect adverse drug reactions, offering a novel approach to pharmacovigilance [33].These studies underscore sentiment analysis as a strategic tool in the pharmaceutical and biotechnology sectors, helping to monitor in real time the early detection of health concerns and facilitating targeted communication strategies, thus significantly influencing public health outcomes, corporate strategies, and public opinion in the validity of the application of sentiment analysis to complex research.

Hypothesis
If the heterogony in RDC deals is present, we would expect to observe distinct latent classes separating RDC transactions impacted by public sentiment from those that are not when considering RDC deal data (H 1 ).We use positive sentiment taken from ten years of Twitter micro-blogs relating to biotechnology and the life sciences as a proxy for market behaviour to capture RDCs that look to the market and comparables to benchmarks RDC valuations as put forward in the current literature.These assumptions extend the expectations put forward in the efficient market theories, where if knowledge asymmetries and bias are controlled, then pricing models should fully reflect this information in the price, and practitioners should only look to market behaviour if these elements remain otherwise uncontrolled and not effectively managed or captured [14].
Hypothesis 1. H 0 -RDC deal values show no impact from the application of sentiment as a proxy for behavioural-based market asymmetries.H 1 -We observe that positive sentiment as a proxy market behaviour is observed where at least one latent class is aligned with the application of our proxy.

RDC deal data
A sample of RDC data was taken from the Current Partnering database [34].The Current Partnering database captures publicly disclosed deal data for research and development collaborations between biotechnology firms and pharmaceutical companies.Specifically, we were interested in the total deal value and the stages of development at which the deals were entered into.Our sample spanned RDCs from January 1 st 2010 through to June 30 th 2021.We down-sampled RDC data to weekly intervals, taking the first observation in a given week, to ensure appropriate scale from a temporal perspective for this research, and in line with other research of this nature.An obvious concern with data of this type is survivourship bias [9].Our research failed to find any datasets that capture failed RDCs.Understanding the challenges this phenomenon represents, careful consideration was given to model specification and the application of appropriate control variables to ensure an accurate representation of this field in practice.
Survivourship bias in RDCs skews the analysis towards surviving entities, overlooking the failures that could provide critical insights into the challenges and barriers these collaborations face.To mitigate this bias, a broader methodological approach includes a carefully designed study to indirectly infer the impact of non-surviving entities and enhancing the robustness of conclusions.However, this approach has its limits, as the absence of direct data on failed RDCs inherently constrains the depth of analysis possible, emphasising the need for more comprehensive data collection strategies in future research.

Sentiment analysis data
Sentiment analysis drawn from Twitter data is becoming more commonplace in economic research [35,36].The use of Twitter data as a proxy for market behaviour is gaining broader academic acceptance as it continues to prove valuable in difficult-to-understand market settings [37].Although sentiment analysis is gaining a stronger foothold, the debate is still vigorous around the techniques that should be used to produce a meaningful methodology [38].
Twitter data is captured from micro-blogs in the form of "tweets".When our data were sampled, tweets were 280 characters in length, including text, audio and video.Since completing the research, this character limit has increased to 10,000 characters per tweet (https://www.theverge.com/2023/4/14/23683082/twitter-blue-10000-character-limit-bold-italic-featuressubstack-newsletter).Twitter users rely on metadata searches to find topics of interest.These metadata tags are referred to as "hashtags" due to the use of the "#" symbol as a preface for any metadata tags attached to the tweet.We used an academic account on Twitter to perform hashtag searches on all tweets of interest going back to the beginning of 2010.The Twitter database extends back to its inception in 2006.Python code was produced to sample n = 10,000 tweets per week through the Twitter API from the 27th of November 2009 through to the 6th of June 2021.The total sample size was n = 5,606,429 tweets for the period described.
Tweetroot was used to produce the primary search terms for the research.Tweetroot is a word cloud production tool that samples like terms from Twitter to refine hashtag searches.Word clouds were seeded with the following terms: "biotech", "biotechnology", "Biotech" and "Biotechnology".After multiple iterations, n = 22 hashtags were captured that were common to each word cloud and the initial search terms.

Methodology
We seek to understand if behavioural biases can be observed in the analysis of latent classes in RDC deal data.We do this through the application of finite mixture model analysis to total deal values in biotechnology and pharmaceutical RDCs.Mixture models are widely used in a myriad of disciplines, from marine mammal movement, to hospitalisation duration, to the study financial markets, and are considered robust where data is heterogeneous [39,40].As the effect we seek to observe is behavioural, we include positive sentiment taken from microblogging on the Twitter platform as a proxy for market sentiment.We include sentiment along with the clinical stage of development (Discovery, Preclinical, Phase I, Phase II and Phase III) to provide a detailed analogue of the features which influence RDC deal-making/ doing.Along with this primary model, we also produce several models to explore what impact, if any, the frequency of positive, negative and neutral sentiment tweets per week may have on the models output.In addition to this, we also explore what impact the strength, or magnitude, of the sentiment score may have on the output of the model.
In addressing the challenge of reverse causation in temporal data analysis with finite mixture models, a pivotal strategy is to use each identifiable event only once in the analysis.This approach is specifically effective in scenarios where sentiment or similar factors could influence the lead-up to an event but not its after.By structuring the dataset in this manner, each event is treated as a unique entity, ensuring that any sentiment-driven effects are confined to their temporal context [41].
6.1 The model 6.1.1Specification.As appropriate for this research, we used a conditional model where the conditional probability of an observation in a continuous feature vector is found in ith latent class given our y vector.We do this through the application of the logistic finite mixture model (FMM) described in Eq 3, where f(y) is the density for deal value, assumed to be Gaussian and conditioned upon the covariates in x.With k = 1, . .., K mixing components, each has its own coefficient vector of β k , which is the feature that captures parameter heterogeneity.We employ K = 3 mixing components, where the probability of latent class allocation p k (z, θ) is given by a multinomial logit, with covariates z and parameter vector θ.This is a simple specification based upon n = 3 latent classes that allow for differences in response to our variables of interest.Our model is then estimated through maximum likelihood, with the results in the following section.
Typically, an FMM captures multiple density functions in a probabilistic setting, where the impacts on the y variable come from distinct latent classes (g) (In its simplest form, an FMM would employ K = 2 mixing components.).In the model we describe, we examine three latent classes, f 1 , f 2 and f 3 , where the proportions of each density to the total sampled data are described as π 1 , π 2 and π 3 .K = 4 and K = 5 component models were also examined, yet both failed to converge, supporting the fit of the K = 3 model used (To confirm the validity of the approach and the application of positive sentiment in isolation, a suite of models were examined.These models included each polarity of sentiment in isolation (positive, negative and neutral), pairwise groupings of each sentiment class and finally a combination of all three.These permutations were all tested at K = 4 and K = 5.All models failed to converge, providing confidence in methodology chosen.).The marginal likelihood is described as follows: where θ is a vector of the model parameters.The y vector captures the observed response variable of RDC deal value and x is the vector of independent variables, which include positive sentiment and binary variables for the clinical stage of development.c 0 is the vector of latent class indicators (c 1 , . .., c g ), which when c i = 1 all remaining observations of c are equal to zero.π i is the probability of the ith latent class and f i (�) describes the conditional probability density function in the ith class.The joint density function for a given observation is f ij ðy ij jx; θÞ and we model the dependence of y ij on x through the following linear assumption of where β ij is the vector of coefficients for y ij and thus f i (y|x, θ) can be specified as f i (y|z i , θ) and The likelihood for a given observation is described in Eq 8.
6.1.2Probabilities for latent class allocation.Probabilities for latent classes are produced using multinomial logistic distribution where the ith class probability is described as or more simply, in our case, if we were to use a full mixture model we would observe where the probability of an observation being in a given class is described in Eqs 9 & 10.As γ 1 is treated as the base then γ 1 = 0.The ith class linear prediction is and γ i is the vector of coefficients, therefore, vector θ is derived from γ i as the vector of coefficients for the ith latent classes and β ij is the vector of coefficients for y ij .

Expectation maximisation.
Prior to maximising likelihoods, the starting values are refined using an expectation maximisation (EM) algorithm.EM is an iterative method used to estimate parameters in models that are examining unobserved latent classes.The EM algorithm operates in two phases, expectation (E) and maximisation (M), to compute the expected log-likelihood which is defined as where the log-likelihood is expressed as and to realise the form of the expected log-likelihood we look to the linear function of the expectation (E) step as In this scenario we describe the posterior probability as p i , with the log-likelihood for an observation being where Q(θ|θ (k) ) is a function of θ (k) through p i , and the maximisation (M) step maximises Q(θ| θ (k) ) considering θ to find θ (k+1) .

Regression for latent class predictions.
Within latent class predictions are made with a standard linear regression model using where b m i is the predicted mean of y in the ith latent class and e p i is the predicted posterior probability for the ith latent class.

Software
This research used STATA 17, a robust statistical software package, to perform the modelling outlined in this methodology.A Logit Finite Mixture Model was specified to estimate the probability of varying levels of deal valuation based on public sentiment and the development stage of the RDCs.

Results
As discussed in later in section 6, we considered the impact of positive sentiment derived from biotechnology and pharmaceutical-specific tweets sampled from Twitter's database along five stages of clinical development (Discovery, Preclinical, Phase I, Phase II and Phase III) to examine if latent classes are present in RDC deal value based upon sentiment used as a proxy for market behaviour.88% (534 of 602) of the tweets used in this research are significantly positive with an α of 5% with a p-value of 0.000.This intensity was confirmed with a pairwise t-test that compared positive and negative sentiment per week.The terms captured for the search are described in Table 1.The descriptive statistics for these variables can be found in Table 2.As can be seen, Total Deal Value (log) and positive sentiment are continuous variables, with the clinical stages of development dummy, binary variables.Log conversion was applied to the Total Deal Value data to address the data's right skew.This transformation aims to linearise the data and stabilize the variance.
A histogram of the primary dependent variable of Total Deal Value and a mixed density plot of the three latent classes were produced and overlaid in Fig 1 for fit.As can be observed, both plots display multimodal attributes as expected.Two peaks can be seen in the left tail of the distribution in the Total Deal Value data, which is reflected in the abnormal left tail of the mixed densities plot.Considering the distribution and the heterogeneous composition of the data, a typical OLS approach would be unsuitable.Both of these plots describe data that would otherwise be considered heterogeneous and non-Gaussian when considered in its entirety, supporting the exploration of the data through the application of a mixture model as described.
Based on the assumptions taken from initial observations of the data, a three-component model was considered most appropriate as discussed in section 6 of this manuscript.Table 3 presents the results of the three-component, finite mixture model.A multinomial logit was employed to predict class membership to ensure the model accounts for unobserved heterogeneity.This was chosen as a logit-based FMM is better suited to modelling class allocation of binary variables such as stage of development used in this research.Total Deal Value is used to specify the class probability for the logit FMM model.Latent classes 2 & 3 are significant when considered against the base outcome at an α of 5%.Class 2 & 3 display a downward characteristic in their coefficients with values of -8.6953 and -6.2622 respectively.Clusters were analysed and silhouette scores produced where K2 scored 0.68 and K3 0.80.Both clusters demonstrate solid cohesion and meaningful separation.
Table 4 displays the outputs of our base outcome, or latent class 1. Latent class 1 finds that positive sentiment is insignificant against Total Deal Value.When we examine the dummies for clinical stage of development, we see that all stages of development are significant at an α of 1%.The impact of each of these stages of development appears to be similar in their impact on Table 1.Hashtag (#) search terms.Word clouds were produced using Tweetroot using the terms-"biotech", "biotechnology", "Biotech" and "Biotechnology".A total of n = 22 terms were captured that were common to each word cloud and the initial search terms.Table 3. Finite mixture model.Table 3 captures the results of the three component model outlined in the methodology, where a multinomial logit was used to predict class membership based on Total Deal Value.The finite mixture model demonstrates a downward trend in total deal value from class 1.This effect is most prominent in class 2 with a coefficient of -8.6953 on total deal value, -6.2622 for total deal value in class 3. Total Deal Value.Posterior probability estimations find that latent class 1 exclusively captures RDCs that took place at the Discovery stage.

Coefficient
In Table 5 we observe behaviour that is holistically different to that observed in latent class 1.Where latent class 1 captured observations at the Discovery stage of clinical development, latent class 2 is discrete also in its representation of Preclinical RDCs.Positive sentiment is significant at an α of 1%, and the magnitude of the effect at 29.865 inspires confidence in the allocation of this class.Preclinical, Phase I and Phase III are significant at an α of 5%.Contrary to latent class 1, these three stages of development trend downward against Total Deal Value.
When considering Table 6 we observe that latent class 3 caters exclusively to deals in clinical development-Phase I, Phase II & Phase III.Like latent class 1, positive sentiment fails to demonstrate significance for these stages of development, and like latent class 1, all binary independent variables for stage of development are significant at an α of 1%.The magnitude of these coefficients whilst meaningful, are less prominent than those observed for the discovery stage of development dummies in latent class 1, ranging from 4.676 to 5.003.
These results find three distinct sub-populations within the data studied.Latent class 1 captures RDCs at the earliest stage of development.Latent class 2 captures RDCs that have survived initial proof of concept and are otherwise ready to progress toward the more complex clinical trials, but still otherwise in the laboratory.Latent class 3 agglomerates all clinical stage (Phase I, Phase II & Phase III) RDCs.In the next section, we discuss what these findings mean and how researchers and practitioners can use this information when considering RDC in future settings-academic and commercial alike.We considered that there may be an impact imparted on the approaches chosen from both the frequency of the tweets per sampling period along with the strength, or magnitude, of the sentiment score.These examinations did not produce meaningful results, which confirms the approach described.The results of these models can be found in Appendix 5.2.7), Latent class 1 captures RDCs classed as preclinical.These results demonstrate that these early transactions with long lead times appear to be otherwise unaffected by sentiment due to the lead times for these products to reach the market.

Discussion
Various research considering RDC in practice has looked at the use of comparables to manage for knowledge asymmetries and bias in RDC market behaviour [42,43].Given the diversity of both the products in development and the stage at which the products are transacted, along with the rapid evolution of the industry since the early 1990s, the field requires a more astute means of holistically valuing biotechnology and pharmaceutical RDCs [44].This research has explored a method to provide a quantifiable measure of sentiment analysis as a proxy for market behaviour which can be used to mitigate the challenges faced by practitioners as they seek to understand market behaviour in the RDC valuation paradigm.Using a comprehensive and representative dataset, we find that public sentiment surrounding the biotechnology and pharmaceutical industries used as a proxy for market behaviour can be used to classify transactions that are most likely to be impacted by market behaviours.As outlined earlier, This research finds that deals at the Preclinical stage of development are impacted by positive sentiment.Our empirical analysis demonstrates that the data relating to RDCs is multimodal, which supports the use of an FMM to preserve the inherent heterogeneity within the dataset.To this end, a conditional (multinomial logit) model with three components was specified for our research.
One of the most interesting outcomes of this research is the discrete allocation of observations to latent classes based upon stage of development.As observed in Table 7, all Discovery stage observations are captured by latent class 1. Latent class 2 captures all Preclinical stage observations and all remaining deals in the various stages of clinical development are found in latent class 3. Sentiment as a proxy for market behaviour also appears to be discrete in that it appears to impart an effect on Preclinical RDCs exclusively.We posit this is due to the information available to the public and the de-risking of projects as they progress through clinical trials.In the context of biotechnology and pharmaceutical RDCs, a large proportion of Table 6.Latent class 3. Latent class 3 makes up the balance of transactions observed and are exclusive to clinical development (clinical trial phases I, II & III).Unlike latent class 2 however, these transactions also appear to be unaffected by sentiment.Of note, a large portion of these transactions are in phase I, which is also considered early development, and we consider this class capturing transactions that don't fall into traditional assumptions of "early development".Discovery deals have little IP protection and publication on the underlying technology is limited.This generally means that discovery stage RDCs have little to no public disclosures for the market to rely upon [45].With no disclosure, public sentiment can not be relied upon, and as we observed, there is insignificant impact from positive sentiment at the Discovery stage.In clinical products, RDCs are less impacted by market behaviour and knowledge asymmetries are lessened as disclosures increase and the products and their inherent risks are better understood at a fundamental and commercial level [46,47].This allows for more traditional valuation methodologies to hold true in clinical-stage RDCs.With these effects in mind, we find that preclinical deals see a meaningful impact from market behaviour.We believe this is because preclinical products have had enough public disclosure in the form of news and publications to allow the market to form an opinion on the veracity of the claims being made.At the same time, the drugs lack the clinical development needed to offset the knowledge asymmetries as we observe in the clinical products captured in latent class 3. We believe this is why we see such a significant of impact positive sentiment in latent class 2 alone.

Conclusion
The key points of this research are as follows: 1. Objective and Methodology: The study aimed to investigate the use of positive public sentiment as a proxy for market behaviours and to understand what impact this has on RDC valuations.

Model Efficacy:
The model effectively captured the impact of positive sentiment on RDCs and the grouping of knowledge asymmetries based on development stages in each latent class.

Significance of Research:
The findings validate the use of positive public sentiment as a proxy for market behaviour in preclinical RDCs.This is significant as it addresses the challenges of valuing biotechnology and pharmaceutical products, given the complex market behaviours.

Acknowledgment of Limitations:
We recognise the potential influence of survivourship bias and other external factors not included in the study, suggesting a cautious interpretation of the results.
This research sought to examine the use of positive public sentiment as a proxy for market behaviours and knowledge asymmetries present in RDCs in practice.Whilst many methods exist for the valuation of biotechnology and pharmaceutical products, the challenges stemming from the understanding of broader market behaviours persist.Our research is impactful as our findings support our choice of proxy for market behaviour for RDCs at the Preclinical stage of development.The latent groups we observe as part of the outputs of this research support the use of finite mixture analysis when considering the discrete nature of these transactions.Whilst the model demonstrated efficacy in capturing RDCs impacted by positive sentiment as a proxy for market behaviours, it also captures the expected grouping of knowledge asymmetries based upon the stage of development in each Latent class.
Although we cannot rule out the effects of survivorship bias or other exogenous factors not considered by this research, we consider the nature of the outcomes of this research compelling, and a justification for further research in this evolving space.As one would expect, the very challenges facing practitioners valuing RDCs impact researchers examining the phenomenon.Further research would benefit from an understanding of the factors that impact RDCs who don't find RDC partners and otherwise fail to meet clinical milestones due to a lack of partnership.Understanding these biases could go a long way to shape and inform future research.
Our research has gone beyond the typical understanding of RDC value and provides an understanding of how practitioners and policymakers could benefit from a more holistic understanding of market conditions and their inherent impact on RDC valuations.Future work could look to explore the impact of sentiment on RDCs at a firm level, more specifically, how firm-level disclosures impact sentiment and thus RDC values throughout the product development lifecycle.

Fig 1 .
Fig 1. Latent class density.A mixed density of the three latent classes was overlaid on a histogram of the primary Total Deal Value data to visualise if the data displayed a propensity toward being multimodal.As displayed, we see multimodal attributes in the left tails of both the histogram and the mixed densities plots making this data a good candidate for mixture modelling.https://doi.org/10.1371/journal.pone.0307116.g001

Table 2 .
Descriptive statistics.Descriptive statistics for the features used in the model described in Eq 16.As we see, the dependent variable of Total Deal Value and the independent variable of positive sentiment are continuous, whilst the independent variables representing stage of development are binary. https://doi.org/10.1371/journal.pone.0307116.t002

Table 4 .
Latent class 1.When considering the class allocation table (

Table 5 .
Latent class 2. Latent class 2 exclusively captures products that were in preclinical development when transacted as captured in the class allocation table(Table 7).These transactions demonstrate the importance of public sentiment on early-stage development.

Table 7 .
Class allocation per stage.Class allocation based upon the specified model provide a discrete understanding of how the market reacts to RDCs at varying stages of development.As we observe, discovery RDCs are exclusive to latent class 1, preclinical to latent class 2 and the clinical RDCs to Latent class 3. https://doi.org/10.1371/journal.pone.0307116.t007